Methods for interpreting results of various high-throughput omics studies.
Most omics analyses produce a list of genes (or proteins, metabolites, etc., that are associated with a gene) that are differentially expressed.
For example, a comparison of protein levels between two groups.
How do we understand a list of differentially expressed genes in biological context?
Functional enrichment terminology
Gene set: an unordered collection of functionally related genes (a pathway)1
Gene ontology (GO): a formal representation of three aspects of biological knowledge2
Molecular function
E.g., “catalysis” or “transport”
Cellular component
Either cellular compartments (e.g. “mitochondrion”) or stable macromolecular complexes (e.g. “ribosome”)
Gene ontology (continued):
Biological processes
Larger processes made up of multiple molecular functions.
E.g., “signal transduction” or “glucose membrane transport”
Not necessarily equivalent to a pathway
GO graph
Organized as a directed acyclic graph (DAG)
KEGG: Kyoto Encyclopedia of Genes and Genomes
A “manually curated database resource integrating various biological objects categorized into systems, genomic, chemical and health information” (Kanehisa et al., 2023).
# 2x2 table: overlap 50; list sizes 1000 and 260; 10000 genes total
m <- matrix(c(50, 1000 - 50, 260 - 50, 10000 - 1000 - 260 + 50), nrow = 2)
fisher.test(m, alternative = "greater")$p.value
[1] 3.825066e-06
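The one-sided Fisher's exact test above is equivalent to an upper-tail hypergeometric probability. The sketch below checks this using the same counts (an overlap of 50, list sizes of 1000 and 260, 10000 genes in total); the variable names are illustrative and not part of the original example.

```r
# Counts taken from the 2x2 table above; names are illustrative
overlap <- 50
size_a  <- 1000
size_b  <- 260
total   <- 10000

m <- matrix(c(overlap, size_a - overlap,
              size_b - overlap, total - size_a - size_b + overlap),
            nrow = 2)

# One-sided Fisher's exact test (enrichment = odds ratio > 1)
p_fisher <- fisher.test(m, alternative = "greater")$p.value

# Same probability from the hypergeometric distribution:
# P(X >= overlap) when drawing size_b genes from a pool with size_a "successes"
p_hyper <- phyper(overlap - 1, size_a, total - size_a, size_b,
                  lower.tail = FALSE)

all.equal(p_fisher, p_hyper)
```

Note the `overlap - 1`: `phyper(q, ..., lower.tail = FALSE)` returns P(X > q), so subtracting one gives P(X >= overlap).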
Problems with over-representation analysis (ORA)
Does not automatically account for the direction of differential expression
Can look at down-regulated and up-regulated genes separately.
Also does not take effect size into account
Will miss small but coordinated changes across lots of genes
Such coordinated changes can be more biologically relevant than a large change in a single gene.
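Running ORA separately on up- and down-regulated genes could look like the sketch below. All inputs are hypothetical toy data (made-up gene IDs, a made-up gene set, and fold changes set by hand), and `ora_p()` is a helper written for this example, not a function from any package.

```r
# Hypothetical toy inputs: 1000 genes, a 50-gene set, hand-set log2 fold changes
universe <- paste0("g", 1:1000)
gene_set <- universe[1:50]
log2fc <- setNames(rep(0, length(universe)), universe)
log2fc[1:20]   <- 2    # 20 set members up-regulated
log2fc[51:80]  <- 2    # 30 non-members up-regulated
log2fc[21:25]  <- -2   # 5 set members down-regulated
log2fc[81:130] <- -2   # 50 non-members down-regulated

# Illustrative helper: one-sided Fisher's exact test for over-representation
ora_p <- function(de_genes, gene_set, universe) {
  in_set <- factor(universe %in% gene_set, levels = c(TRUE, FALSE))
  in_de  <- factor(universe %in% de_genes, levels = c(TRUE, FALSE))
  fisher.test(table(in_set, in_de), alternative = "greater")$p.value
}

# Split the differentially expressed genes by direction, then test each list
up   <- names(log2fc)[log2fc > 1]
down <- names(log2fc)[log2fc < -1]

p_up   <- ora_p(up, gene_set, universe)
p_down <- ora_p(down, gene_set, universe)
c(up = p_up, down = p_down)
```

In this toy setup the gene set is strongly over-represented among the up-regulated genes only. The fold-change threshold is still a hard cutoff, so the effect-size limitation remains.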
Gene set enrichment analysis (GSEA)
The goal is to test whether the members of a gene set \(S\) are distributed randomly throughout a ranked gene list \(L\) or are concentrated at the top or bottom (Subramanian et al., 2005).
An enrichment score (ES) is calculated by:
Walking down the gene list \(L\) (usually ranked by effect size/correlation with the phenotype).
The running ES increases when a gene is in \(S\) and decreases when it is not.
The final ES is the maximum deviation from 0 encountered during the walk, and corresponds to a weighted Kolmogorov–Smirnov-like statistic (Subramanian et al., 2005).
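The walk described above can be sketched in a few lines of base R. This is a minimal illustration rather than the reference implementation: it assumes hits are weighted by the absolute ranking statistic raised to a power `p` and misses are weighted equally, and the function name, gene IDs, and statistics are made up for the example.

```r
# Illustrative running-sum enrichment score (not the official GSEA code)
enrichment_score <- function(stat, in_set, p = 1) {
  ord    <- order(stat, decreasing = TRUE)   # rank genes by the statistic
  stat   <- stat[ord]
  in_set <- in_set[ord]

  hit  <- ifelse(in_set, abs(stat)^p, 0)
  hit  <- hit / sum(hit)                     # hit increments sum to 1
  miss <- ifelse(in_set, 0, 1 / sum(!in_set))  # miss decrements sum to 1

  running <- cumsum(hit - miss)              # walk down the ranked list
  running[which.max(abs(running))]           # maximum deviation from 0
}

# Hypothetical ranked list of 8 genes and a 3-gene set
stat <- c(g1 = 3, g2 = 2.5, g3 = 2, g4 = 1, g5 = 0.5,
          g6 = -0.5, g7 = -1, g8 = -2)
gene_set <- c("g1", "g2", "g4")

es <- enrichment_score(stat, names(stat) %in% gene_set)
es
```

Setting `p = 0` weights every hit equally, which reduces the score to an unweighted Kolmogorov–Smirnov-like statistic.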
The statistical significance of the ES is estimated using an empirical phenotype-based permutation test.
Shuffling the phenotype labels preserves gene-gene correlations, which makes it preferable to shuffling gene labels.
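A phenotype-based permutation test might be sketched as follows. Everything here is a simplifying assumption for illustration: the expression matrix is simulated, the ranking metric is a plain difference in group means, and the p-value is a one-sided comparison against all permuted scores, which is simpler than the procedure in the GSEA paper.

```r
set.seed(1)  # reproducible toy example

# Hypothetical data: 100 genes x 12 samples, two groups of 6
n_genes <- 100
pheno   <- rep(c("control", "treated"), each = 6)
expr    <- matrix(rnorm(n_genes * length(pheno)), nrow = n_genes)
in_set  <- seq_len(n_genes) <= 15             # genes 1-15 form the set S
expr[in_set, pheno == "treated"] <- expr[in_set, pheno == "treated"] + 1

# Ranking metric (assumption): difference in group means per gene
rank_metric <- function(expr, pheno) {
  rowMeans(expr[, pheno == "treated", drop = FALSE]) -
    rowMeans(expr[, pheno == "control", drop = FALSE])
}

# Illustrative running-sum enrichment score
enrichment_score <- function(stat, in_set, p = 1) {
  ord     <- order(stat, decreasing = TRUE)
  hit     <- ifelse(in_set[ord], abs(stat[ord])^p, 0)
  miss    <- ifelse(in_set[ord], 0, 1 / sum(!in_set))
  running <- cumsum(hit / sum(hit) - miss)
  running[which.max(abs(running))]
}

es_obs <- enrichment_score(rank_metric(expr, pheno), in_set)

# Null distribution: shuffle sample labels, re-rank genes, recompute the ES.
# Rows of expr are never shuffled, so gene-gene correlations are preserved.
n_perm  <- 500
es_null <- replicate(
  n_perm,
  enrichment_score(rank_metric(expr, sample(pheno)), in_set)
)

# One-sided empirical p-value with a +1 correction to avoid p = 0
p_perm <- (sum(es_null >= es_obs) + 1) / (n_perm + 1)
p_perm
```

Because only the sample labels are permuted, each permuted ranking is built from the same correlated expression values as the observed one.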
clusterProfiler
An R/Bioconductor package: essentially a set of wrapper functions that simplify functional enrichment analyses.
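As a rough illustration of the wrapper style, the sketch below runs ORA with `enricher()`, the clusterProfiler function that accepts user-supplied gene sets. The gene IDs, gene sets, and list of differentially expressed genes are all hypothetical, and the cutoffs are relaxed to 1 only so that every tested set is returned. The call is skipped if the package is not installed.

```r
# Hypothetical toy annotation: three gene sets over 200 made-up gene IDs
genes <- paste0("g", 1:200)
term2gene <- data.frame(
  term = rep(c("set_A", "set_B", "set_C"), times = c(40, 80, 101)),
  gene = c(genes[1:40], genes[41:120], genes[100:200])
)

# Hypothetical list of differentially expressed genes
de_genes <- c(genes[1:20], genes[150:155])

ora_res <- NULL
if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  ora_res <- clusterProfiler::enricher(
    gene         = de_genes,
    TERM2GENE    = term2gene,  # two columns: term, then gene
    pvalueCutoff = 1,          # relaxed so all tested sets are reported
    qvalueCutoff = 1
  )
  print(as.data.frame(ora_res))
}
```

The analogous entry point for ranked lists is `GSEA()`, which expects a named numeric vector sorted in decreasing order instead of a plain vector of gene IDs.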